Intro

In this notebook we will compare the biomarker results from 8 different cell lines. These results were obtained through an ensemble model analysis, which was performed on the models generated by the Gitsbe module when running the DrugLogics computational pipeline for finding synergistic drug combinations (drug pairs). The models in each different cell line setup were trained towards a specific signaling activity profile matching the activity state of that particular cell line. For more info about the cell line specific model & biomarker analysis, see the respective notebooks:

Note that the biomarkers identified in the aforementioned notebooks, were classified as:

  • Performance-related biomarkers, which represent important nodes that make our models achieve a higher model performance classification (the methodology was based on the number of true positives (\(TP\)) and the Mathews Correlation Coefficient score (\(MCC\))
  • Per synergy predicted biomarkers, which represent important nodes that make our models predict some of the synergies that were observed in the lab

In each category, the biomarkers can be either in an active state or in an inhibited state. The input for the simulations and the output data are in the cell-lines-2500 directory (the 2500 number denotes the number of simulations executed). The analysis will be presented step by step in the sections below.

Prerequisites

Firstly, we load the required libraries (you need to install them if you don’t have them):

library(circlize) # version 0.4.5, see (Gu, 2014)
library(ComplexHeatmap) # version 1.99.5, see (Gu, 2016)

and the relevant helper functions:

# Set the working directory to the gitsbe-model-analysis folder: 
# setwd("pathTo/gitsbe-model-analysis")
source("Rscripts/input_functions.R")
source("Rscripts/output_functions.R")
source("Rscripts/analysis_functions.R")

The R version used for this analysis is:

pretty.print.string(R.version.string)

R version 3.5.2 (2018-12-20)

Input

Firstly, we define the necessary input (cell line names, directories and files of interest, etc.):

cell.lines = c("A498", "AGS", "colo205", "DU145", "MDA-MB-468", "SF295", 
               "SW620", "UACC62")

cell.lines.dirs = sapply(cell.lines, function(cell.line) {
  paste0(getwd(), "/cell-lines-2500/", cell.line)
})

biomarkers.dirs = sapply(cell.lines.dirs, function(cell.line.dir) {
  paste0(cell.line.dir, "/biomarkers")
})

observed.synergies.files = sapply(cell.lines.dirs, function(cell.line.dir) {
  paste0(cell.line.dir, "/observed_synergies")
})

# get the drug combinations tested
model.predictions.file = paste0(cell.lines.dirs[1], "/model_predictions")
model.predictions = get.model.predictions(model.predictions.file)
drug.combinations.tested = colnames(model.predictions)

# get the node names (same topology for all cell lines)
models.dir = paste0(cell.lines.dirs[1], "/models")
node.names = get.node.names(models.dir)

Performance Biomarkers

Next, we will visualize the performance biomarkers found from the cell-specific model analysis, per cell line:

biomarkers.perf.active = 
  get.perf.biomarkers.per.cell.line(biomarkers.dirs, type = "active")

biomarkers.perf.inhibited =
  get.perf.biomarkers.per.cell.line(biomarkers.dirs, type = "inhibited")

biomarkers.perf.res = 
  merge.perf.biomarkers(node.names, cell.lines, biomarkers.perf.active, 
                        biomarkers.perf.inhibited)

# remove nodes which are not biomarkers for any cell line
biomarkers.perf.res = prune.columns.from.df(biomarkers.perf.res, value = 0)

# define a coloring
biomarkers.col.fun = colorRamp2(c(-1, 0, 1), c("tomato", "grey", "gold"))

perf.biomarkers.heatmap = 
  Heatmap(matrix = as.matrix(biomarkers.perf.res), col = biomarkers.col.fun,
        column_title = "Performance biomarkers per cell line",
        column_title_gp = gpar(fontsize = 20),
        row_title = "Cell Lines", row_title_side = "left",
        row_dend_side = "right", row_names_side = "left",
        column_names_gp = gpar(fontsize = 5),
        rect_gp = gpar(col = "black", lwd = 0.3),
        heatmap_legend_param = list(at = c(-1, 1), title = "Activity State",
          labels = c("Inhibited", "Active"), color_bar = "discrete", ncol = 2,
          title_position = "leftcenter", direction = "horizontal"))

draw(perf.biomarkers.heatmap, heatmap_legend_side = "bottom")

  • Across all cell lines, a total of 91 performance biomarkers were found
  • The three gastrointestinal cell lines (AGS,SW620,colo205) have a lot of common biomarkers. These common biomarkers between the cell lines can be in either the same activity state (SW620 vs AGS) or the reverse (SW620 vs colo205)
  • In general the biomarker results are different between the different type of cell lines
  • There exist unique biomarkers per cell line which are not shared with any of the other cell lines

Observed synergies

Each of the cell lines studied has a different set of observed synergies (drug combinations that were found synergistic across all the 153 tested ones). In this section, we will visualize the cell lines’ observed synergies and mark the synergies that were also predicted by the cell-specific models. First, we get the biomarkers for these synergies from each cell line:

synergy.biomarkers.cell.specific = 
  get.synergy.biomarkers.per.cell.line(biomarkers.dirs)

total.predicted.synergies.cell.specific = character(0)
for (cell.line in cell.lines) {
  total.predicted.synergies.cell.specific = 
    c(total.predicted.synergies.cell.specific, 
      rownames(synergy.biomarkers.cell.specific[[cell.line]]))
}
total.predicted.synergies.cell.specific = 
  unique(total.predicted.synergies.cell.specific)
total.predicted.synergies.cell.specific.num = 
  length(total.predicted.synergies.cell.specific)

Then, we get the observed synergies from each cell line:

observed.synergies.per.cell.line = sapply(
  observed.synergies.files, function(observed.synergies.file) {
    get.observed.synergies(observed.synergies.file)
})

observed.synergies.res = 
  merge.observed.synergies(drug.combinations.tested, cell.lines, 
                           observed.synergies.per.cell.line)

# remove drug combinations which are not observed in any of the cell lines
observed.synergies.res = prune.columns.from.df(observed.synergies.res, value = 0)

total.observed.synergies = colnames(observed.synergies.res)
total.observed.synergies.num = length(total.observed.synergies)

Lastly, we visualize the observed and predicted synergies for all cell lines in one heatmap:

# color the cell-specific predicted synergies
predicted.synergies.colors = rep("black", total.observed.synergies.num)
names(predicted.synergies.colors) = total.observed.synergies
predicted.synergies.colors[total.observed.synergies %in% 
                           total.predicted.synergies.cell.specific] = "blue"

# define a coloring
obs.synergies.col.fun = colorRamp2(c(0, 1), c("red", "green"))

observed.synergies.heatmap = 
  Heatmap(matrix = as.matrix(observed.synergies.res), 
          col = obs.synergies.col.fun,
          column_title = "Observed synergies per cell line",
          column_title_gp = gpar(fontsize = 20),
          column_names_gp = gpar(col = predicted.synergies.colors),
          row_title = "Cell Lines", row_title_side = "left",
          row_dend_side = "right", row_names_side = "left",
          rect_gp = gpar(col = "black", lwd = 0.3),
          heatmap_legend_param = list(at = c(1, 0), labels = c("YES", "NO"), 
            color_bar = "discrete", title = "Observed", direction = "vertical"))

lgd = Legend(at = "Cell-specific", title = "Predicted", 
             legend_gp = gpar(fill = "blue"))

draw(observed.synergies.heatmap,  heatmap_legend_list = list(lgd), 
     heatmap_legend_side = "right")

  • The cell-specific models predicted 12 of the 40 observed synergies found across the 8 cell lines. Thus, the total true positive coverage for the cell-specific modesl across all cell lines is 0.3%
  • Note that there exist synergies which were observed in all cell lines (AK-BI, PI-D1)
  • Cell line specific synergies (e.g. PD-PI, AK-PD in the A498 cell line) were identified only by the cell-specific models (????? MAYBE! HAVE TO CHECK FIRST WITH RANDOM MODEL PREDICTIONS)

Synergy Biomarkers

In this section, we will produce heatmaps showing the biomarkers across the 8 cell lines for every observed synergy that was also predicted by the cell-specific models. First, we manipulate the biomarker result data to fit our purposes:

synergy.biomarkers.cell.specific.res = 
  arrange.by.synergy(synergy.biomarkers.cell.specific, observed.synergies.res, 
                     total.predicted.synergies.cell.specific, node.names)

# For every synergy, remove cell lines that didn't predict the observed synergies
# at all or for which we couldn't find any biomarkers (row pruning).
# Also remove nodes which are not biomarkers for any cell line (column pruning)
for (synergy in names(synergy.biomarkers.cell.specific.res)) {
  df = synergy.biomarkers.cell.specific.res[[synergy]]
  df = prune.columns.from.df(df, value = 0)
  df = prune.rows.from.df(df, value = 0)
  synergy.biomarkers.cell.specific.res[[synergy]] = df
}

# re-order result list based on increasing number of cell lines
synergy.biomarkers.cell.specific.res = synergy.biomarkers.cell.specific.res[
  names(sort(sapply(synergy.biomarkers.cell.specific.res, function(x) { nrow(x) })))
]

Next, we produce the heatmaps (one per synergy predicted):

for (synergy in names(synergy.biomarkers.cell.specific.res)) {
  biomarkers.res = as.matrix(synergy.biomarkers.cell.specific.res[[synergy]])
  
  # customize some parameters
  heatmap.height = ifelse(nrow(biomarkers.res) < 3, 1, 3)
  show.column.dend = ifelse(nrow(biomarkers.res) == 1, FALSE, TRUE)
  row.title = ifelse(nrow(biomarkers.res) == 1, "Cell Line", "Cell Lines")
  
  biomarkers.heatmap = 
    Heatmap(matrix = biomarkers.res, col = biomarkers.col.fun, 
            column_title = paste0("Biomarkers for Synergy: ", synergy),
            column_title_gp = gpar(fontsize = 20, fontface = "bold"),
            row_title = row.title, row_title_side = "right",
            rect_gp = gpar(col = "black", lwd = 0.3),
            column_names_gp = gpar(fontsize = 10),
            show_column_dend = show.column.dend,
            height = unit(heatmap.height, "inches"),
            heatmap_legend_param = list(at = c(-1, 1), title = "Activity State",
              labels = c("Inhibited", "Active"), color_bar = "discrete",
              title_position = "leftcenter", direction = "horizontal", ncol = 2))
  
  draw(biomarkers.heatmap, heatmap_legend_side = "bottom")
}

The main result here is that we observe common biomarkers across many cell lines and in the same activity state for some synergies, which hints that these biomarkers may have a pan-cancer diagnostic value.

References

  1. Zuguang Gu, Lei Gu, Roland Eils, Matthias Schlesner, Benedikt Brors, circlize Implements and enhances circular visualization in R. Bioinformatics (Oxford, England) 2014. PubMed
  2. Zuguang Gu, Roland Eils and Matthias Schlesner, Complex heatmaps reveal patterns and correlations in multidimensional genomic data, Bioinformatics, 2016